Dynamic Elias-Fano Representation

Authors

  • Giulio Ermanno Pibiri
  • Rossano Venturini
Abstract

We show that it is possible to store a dynamic ordered set $S(n, u)$ of $n$ integers drawn from a bounded universe of size $u$ in space close to the information-theoretic lower bound while preserving the asymptotic time optimality of the operations. Our results build on the Elias-Fano representation of $S(n, u)$, which takes $EF(S(n, u)) = n \lceil \log \frac{u}{n} \rceil + 2n$ bits of space and can be shown to be less than half a bit per element away from the information-theoretic minimum. Considering a RAM model with memory words of $\Theta(\log u)$ bits, we focus on the case in which the integers of $S$ are drawn from a polynomial universe of size $u = n^{\gamma}$, for any $\gamma = \Theta(1)$. We represent $S(n, u)$ with $EF(S(n, u)) + o(n)$ bits of space and:

  1. support static predecessor/successor queries in $O(\min\{1 + \log \frac{u}{n}, \log \log n\})$ time;
  2. make $S$ grow in an append-only fashion by spending $O(1)$ time per inserted element;
  3. support random access in $O(\log n / \log \log n)$ worst-case time, insertions/deletions in $O(\log n / \log \log n)$ amortized time, and predecessor/successor queries in $O(\min\{1 + \log \frac{u}{n}, \log \log n\})$ worst-case time.

These time bounds are optimal.

1998 ACM Subject Classification: E.1 Data Structures, E.4 Coding and Information Theory, F.2.2 Nonnumerical Algorithms and Problems
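
As a rough illustration of the static Elias-Fano representation that the abstract builds on, the following Python sketch splits each integer into low bits (stored verbatim) and high bits (stored in unary), then checks the space bound against the formula quoted above. It is not the authors' dynamic structure: the class name `EliasFano`, its methods, and the `ones` list (a plain stand-in for a succinct select structure) are hypothetical names chosen for this example, and bits are held in Python lists for readability only.

```python
class EliasFano:
    """Static Elias-Fano sketch for a sorted list of integers in [0, u)."""

    def __init__(self, values, u):
        assert values and all(0 <= x < u for x in values)
        assert all(a <= b for a, b in zip(values, values[1:]))
        self.n, self.u = len(values), u
        # l = ceil(log2(u / n)), computed with integers only.
        self.l = 0
        while (self.n << self.l) < u:
            self.l += 1
        mask = (1 << self.l) - 1
        # Low parts: l bits per element, stored as-is.
        self.low = [x & mask for x in values]
        # High parts in unary: the i-th 1 ends up at position (x_i >> l) + i.
        self.high = []
        bucket = 0
        for x in values:
            h = x >> self.l
            self.high.extend([0] * (h - bucket))
            bucket = h
            self.high.append(1)
        # Assumption: a list of 1-positions replaces an o(n)-bit select index.
        self.ones = [p for p, b in enumerate(self.high) if b]

    def bits(self):
        """Bits used by the low and high parts (excluding the select stand-in)."""
        return self.n * self.l + len(self.high)

    def access(self, i):
        """Return the i-th smallest element (0-based)."""
        return ((self.ones[i] - i) << self.l) | self.low[i]

    def successor(self, x):
        """Smallest stored element >= x, or None (binary search over access)."""
        lo, hi = 0, self.n
        while lo < hi:
            mid = (lo + hi) // 2
            if self.access(mid) < x:
                lo = mid + 1
            else:
                hi = mid
        return self.access(lo) if lo < self.n else None


ef = EliasFano([3, 4, 7, 13, 14, 15, 21, 43], u=64)
print(ef.l)                                # 3 low bits per element
print(ef.bits(), ef.n * ef.l + 2 * ef.n)   # 37 bits used, bound is 40
print(ef.access(3), ef.successor(8))       # 13 13
```

The binary-search `successor` here runs in logarithmic time and is only meant to show the interface; the bounds stated in the abstract rely on additional indexing that this sketch does not attempt to reproduce.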

Similar resources

Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization

The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are $m$ features in total but only $n = o(\sqrt{m})$ features are required to distinguish examples, with $\Omega(\log m)$ training examples and reasonable settings, it is possible to obtain a good model in a succinct representation using $n \log_2 \frac{m}{n} + o(m)$ bits, by using a pipeline of exi...

A Fano-Huffman Based Statistical Coding Method

Statistical coding techniques have been used for lossless statistical data compression, applying methods such as Ordinary, Shannon, Fano, Enhanced Fano, Huffman and Shannon-Fano-Elias coding methods. A new and improved coding method is presented, the Fano-Huffman Based Statistical Coding Method. It holds the advantages of both the Fano and Huffman coding methods. It is more easily applicable th...

Adaptive Self-Correcting Floating Point Source Coding Methodology for a Genomic Encryption Protocol

We address the problem of creating an adaptive source coding algorithm for a genomic encryption protocol using a small alphabet such as the nucleotide bases represented in the genetic code. For codewords derived from an alphabet of N plaintext with probability of occurrence, p, we describe a mapping into a floating point representation of the codewords which are translated into genomic codeword...

Using an innovative coding algorithm for data encryption

This paper discusses the problem of using data compression for encryption. We first propose an algorithm for breaking a prefix-coded file by enumeration. Based on the algorithm, we respectively analyze the complexity of breaking Huffman codes and Shannon-Fano-Elias codes under the assumption that the cryptanalyst knows the code construction rule and the probability mass function of the source. ...

Punctured Elias Codes for variable-length coding of the integers

The compact representation of integers is an important problem in areas such as data compression, especially where there is a nearly monotonic decrease in the likelihood of larger integers. While many different representations have been described, it is not always clear in which circumstances a particular code is to be preferred. This report introduces a variant of the Elias γ code which is sho...

Compressing Integers for Fast File Access

Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of indexes used to resolve queries, for example, in large internet search systems, and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, if the cost of retrieving a compressed representation from disk and the CPU cost of decodi...

Publication date: 2017